Noah A. Smith
Authors
Abstract
Modulo formatting, this document constitutes a portion of my application for tenure (“section 5”). It details my research, teaching, and service efforts and goals. It is not exhaustive, and it reflects my view of my activities in late May 2013.

1 Research

My research goal is to automate inference from natural language text, including:
• algorithms that interpret text into abstract linguistic structures (§1.1);
• learning algorithms that infer the parameters of such models from text annotated by linguistic experts or non-experts, unannotated text, and text observed in its social context (§1.2);
• task-specific inferences in the form of translation into other languages or real-world predictions (§1.3).

The following elaborates some important aspects of each in turn, emphasizing developments since my review for promotion in 2010 and plans for the future. Some research threads that are not being actively pursued are suppressed for space (e.g., work on text-to-text transformations [Das and Smith, 2009; Heilman and Smith, 2010], including my former Ph.D. student Michael Heilman’s dissertation project; and dynamic programming extensions [Cohen et al., 2008b; Gimpel and Smith, 2009a]).

1.1 Computational Linguistics and Natural Language Processing

Making inferences from a text might require analyzing its meaning; this conjecture underlies much current research in translation, question answering, and information extraction. Operationalizing “meaning” remains a central challenge in computational linguistics. Many agree that syntactic parsing is an important first step toward mapping strings to meanings (see Fig. 1, top, for an example). Further, representations of the propositional semantics of strings have attracted a great deal of attention (i.e., “who did what to whom?” and illustrated in Fig. 1, bottom). My NLP research seeks algorithms for text analysis, subject to the often-competing desiderata listed in Tab. 1.
I call this research program linguistic structure prediction, which is also the title of my monograph [Smith, 2011]. Much of my early work used exact dynamic programming algorithms that find the model-optimal¹ linguistic analysis of a piece of text [Cohen et al., 2008b, 2011b; Dreyer et al., 2006; Eisner and Smith, 2011; Eisner et al., 2004a,b, 2005; Smith and Smith, 2004, 2007; Smith and Johnson, 2007; Smith et al., 2005]. Classic examples are the Viterbi algorithm [Viterbi, 1967] and the weighted version of Earley’s algorithm [Earley, 1970]. These algorithms remain foundational in NLP, but using them requires us to make strong assumptions about the mappings of words to deeper linguistic structures; hence, they can be seen as weak on desideratum (iii).

1.1.1 Approximating Structured Inference: AD3

Unfortunately, inference with guarantees of model-optimality quickly becomes intractable as our models become more linguistically expressive. For example, we might wish to capture non-local interactions between two or more syntactic or semantic arguments of a word [Das et al., 2012], different parts of a translation [Gimpel and Smith, 2009a], or between morphological (word-internal) and syntactic analyses [Cohen and Smith, 2007]. My former Ph.D. student André Martins, his co-advisors Mário Figueiredo (IST), Pedro Aguiar (IST), and Eric Xing (CMU), and I developed a range of general approximate inference techniques

¹ Briefly, such algorithms imply the use of a statistical model, typically a weighted grammar. By “model-optimal,” we mean that the algorithm is guaranteed to find the analysis that the model deems most probable. In statistical terms, this is a kind of maximum a posteriori inference: $\operatorname{argmax}_{y \in Y_x} \mathrm{score}(x, y)$, where $Y_x$ is the set of well-formed analyses of an input string $x$ and $\mathrm{score}$ is a function learned from data, often a probability (see §1.2).
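To make the notion of model-optimal inference concrete, here is a minimal sketch of the Viterbi algorithm for a first-order sequence model (for example, a part-of-speech tagger). It is an illustration only, not code from the work cited above. The tag set, the toy probabilities, and the names `viterbi`, `STATES`, `LOG_START`, `LOG_TRANS`, and `LOG_EMIT` are all assumptions made up for this example. Here score(x, y) is the log-probability of a tag sequence y together with the words x, and the function returns the argmax over all tag sequences without enumerating them.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_score, best_path) under a first-order sequence model.

    score(x, y) = log_start[y0] + log_emit[y0][x0]
                  + sum over t >= 1 of (log_trans[y(t-1)][yt] + log_emit[yt][xt])
    """
    if not observations:
        return 0.0, []
    # best[t][s]: score of the best tag sequence for x[0..t] that ends in state s
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: the predecessor state that achieves best[t][s]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

def _logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: _logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two tags (N = noun, V = verb), two words.
STATES = ["N", "V"]
LOG_START = _logs({"N": 0.7, "V": 0.3})
LOG_TRANS = _logs({"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}})
LOG_EMIT = _logs({"N": {"fish": 0.8, "sleep": 0.2},
                  "V": {"fish": 0.4, "sleep": 0.6}})

if __name__ == "__main__":
    print(viterbi(["fish", "sleep"], STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

The running time is linear in the sentence length and quadratic in the number of states. That efficiency depends on the score decomposing over adjacent tag pairs, which is the kind of strong modeling assumption the text describes; non-local interactions break this decomposition.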
Similar resources
PLOS ONE: Diffusion of Lexical Change in Social Media
Open-access, peer-reviewed research article by Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing.
2010 Senior Thesis Project Reports. Advisor: Kemal Oflazer and Noah Smith. Rich Entity Type Recognition in Text.
This technical report collects the final reports of the undergraduate Computer Science majors from the Qatar Campus of Carnegie Mellon University who elected to complete a senior research thesis in the academic year 2009–10 as part of their degree. These projects have spanned the students’ entire senior year, during which they have worked closely with their faculty advisors to plan and carry ou...